44 research outputs found

    Probing the Informational and Regulatory Plasticity of a Transcription Factor DNA–Binding Domain

    Transcription factors have two functional constraints on their evolution: (1) their binding sites must have enough information to be distinguishable from all other sequences in the genome, and (2) they must bind these sites with an affinity that appropriately modulates the rate of transcription. Since both are determined by the biophysical properties of the DNA–binding domain, selection on one will ultimately affect the other. We were interested in understanding how plastic the informational and regulatory properties of a transcription factor are and how transcription factors evolve to balance these constraints. To study this, we developed an in vivo selection system in Escherichia coli to identify variants of the helix-turn-helix transcription factor MarA that bind different sets of binding sites with varying degrees of degeneracy. Unlike previous in vitro methods used to identify novel DNA binders and to probe the plasticity of the binding domain, our selections were done within the context of the initiation complex, selecting for both specific binding within the genome and for a physiologically significant strength of interaction to maintain function of the factor. Using MITOMI, quantitative PCR, and a binding site fitness assay, we characterized the binding, function, and fitness of some of these variants. We observed that a large range of binding preferences, information contents, and activities could be accessed with a few mutations, suggesting that transcriptional regulatory networks are highly adaptable and expandable.

    A reexamination of information theory-based methods for DNA-binding site identification

    Background: Searching for transcription factor binding sites in genome sequences is still an open problem in bioinformatics. Despite substantial progress, search methods based on information theory remain a standard in the field, even though the full validity of their underlying assumptions has only been tested in artificial settings. Here we use newly available data on transcription factors from different bacterial genomes to make a more thorough assessment of information theory-based search methods. Results: Our results reveal that conventional benchmarking against artificial sequence data leads frequently to overestimation of search efficiency. In addition, we find that sequence information by itself is often inadequate and therefore must be complemented by other cues, such as curvature, in real genomes. Furthermore, results on skewed genomes show that methods integrating skew information, such as Relative Entropy, are not effective because their assumptions may not hold in real genomes. The evidence suggests that binding sites tend to evolve towards genomic skew, rather than against it, and to maintain their information content through increased conservation. Based on these results, we identify several misconceptions on information theory as applied to binding sites, such as negative entropy, and we propose a revised paradigm to explain the observed results. Conclusion: We conclude that, among information theory-based methods, the most unassuming search methods perform, on average, better than any other alternatives, since heuristic corrections to these methods are prone to fail when working on real data. A reexamination of information content in binding sites reveals that information content is a compound measure of search and binding affinity requirements, a fact that has important repercussions for our understanding of binding site evolution.

    Is Thermosensing Property of RNA Thermometers Unique?

    A large number of studies have been dedicated to identifying the structural and sequence-based features of RNA thermometers, mRNAs that regulate their translation initiation rate with temperature. It has been shown that the melting of the ribosome-binding site (RBS) plays a prominent role in this thermosensing process. However, little is known about how widespread this melting phenomenon is, as earlier studies on the subject have worked with a small sample of known RNA thermometers. We have developed a novel method of studying the melting of RNAs with temperature by computationally sampling the distribution of the RNA structures at various temperatures using the RNA folding software Vienna. In this study, we compared the thermosensing property of 100 randomly selected mRNAs and three well-known thermometers: rpoH, ibpA and agsA sequences from E. coli. We also compared the rpoH sequences from 81 mesophilic proteobacteria. Although both rpoH and ibpA show a higher rate of melting at their RBS compared with the mean of non-thermometers, contrary to our expectations these higher rates are not significant. Surprisingly, we also do not find any significant differences between rpoH thermometers from other γ-proteobacteria and E. coli non-thermometers.

    An iterative strategy combining biophysical criteria and duration hidden Markov models for structural predictions of Chlamydia trachomatis σ66 promoters

    Background: Promoter identification is a first step in the quest to explain gene regulation in bacteria. It has been demonstrated that the initiation of bacterial transcription depends upon the stability and topology of DNA in the promoter region as well as the binding affinity between the RNA polymerase σ-factor and promoter. However, promoter prediction algorithms to date have not explicitly used an ensemble of these factors as predictors. In addition, most promoter models have been trained on data from Escherichia coli. Although it has been shown that transcriptional mechanisms are similar among various bacteria, it is quite possible that the differences between Escherichia coli and Chlamydia trachomatis are large enough to recommend an organism-specific modeling effort. Results: Here we present an iterative stochastic model building procedure that combines such biophysical metrics as DNA stability, curvature, twist and stress-induced DNA duplex destabilization along with duration hidden Markov model parameters to model Chlamydia trachomatis σ66 promoters from 29 experimentally verified sequences. Initially, iterative duration hidden Markov modeling of the training set sequences provides a scoring algorithm for Chlamydia trachomatis RNA polymerase σ66/DNA binding. Subsequently, an iterative application of Stepwise Binary Logistic Regression selects multiple promoter predictors and deletes/replaces training set sequences to determine an optimal training set. The resulting model predicts the final training set with a high degree of accuracy and provides insights into the structure of the promoter region. Model based genome-wide predictions are provided so that optimal promoter candidates can be experimentally evaluated, and refined models developed. Co-predictions with three other algorithms are also supplied to enhance reliability. Conclusion: This strategy and resulting model support the conjecture that DNA biophysical properties, along with RNA polymerase σ-factor/DNA binding collaboratively, contribute to a sequence's ability to promote transcription. This work provides a baseline model that can evolve as new Chlamydia trachomatis σ66 promoters are identified with assistance from the provided genome-wide predictions. The proposed methodology is ideal for organisms with few identified promoters and relatively small genomes.

    Compensatory Evolution of Gene Regulation in Response to Stress by Escherichia coli Lacking RpoS

    The RpoS sigma factor protein of Escherichia coli RNA polymerase is the master transcriptional regulator of physiological responses to a variety of stresses. This stress response comes at the expense of scavenging for scarce resources, causing a trade-off between stress tolerance and nutrient acquisition. This trade-off favors non-functional rpoS alleles in nutrient-poor environments. We used experimental evolution to explore how natural selection modifies the regulatory network of strains lacking RpoS when they evolve in an osmotically stressful environment. We found that strains lacking RpoS adapt less variably, in terms of both fitness increase and changes in patterns of transcription, than strains with functional RpoS. This phenotypic uniformity was caused by the same adaptive mutation in every independent population: the insertion of IS10 into the promoter of the otsBA operon. OtsA and OtsB are required to synthesize the osmoprotectant trehalose, and transcription of otsBA requires RpoS in the wild-type genetic background. The evolved IS10 insertion rewires expression of otsBA from RpoS-dependent to RpoS-independent, allowing for partial restoration of wild-type response to osmotic stress. Our results show that the regulatory networks of bacteria can evolve new structures in ways that are both rapid and repeatable.

    Leaderless genes in bacteria: clue to the evolution of translation initiation mechanisms in prokaryotes

    Background: The Shine-Dalgarno (SD) signal has long been viewed as the dominant translation initiation signal in prokaryotes. Recently, leaderless genes, which lack 5'-untranslated regions (5'-UTRs) on their mRNAs, have been shown to be abundant in archaea. However, current large-scale in silico analyses of initiation mechanisms in bacteria are based mainly on SD-led initiation rather than leaderless initiation. The study of leaderless genes in bacteria remains open, which leaves our understanding of translation initiation mechanisms in prokaryotes uncertain. Results: Here, we study signals in the translation initiation regions of all genes in 953 bacterial and 72 archaeal genomes, and then construct an evolutionary scenario for leaderless genes in bacteria. Using an algorithm designed to identify multiple signals in the upstream regions of genes in a genome, we classify all genes as SD-led, TA-led or atypical according to the category of the most probable signal in their upstream sequences. In particular, the occurrence of TA-like signals about 10 bp upstream of the translation initiation site (TIS) in bacteria most probably indicates leaderless genes. Conclusions: Our analysis reveals that leaderless genes are widespread, although not dominant, in a variety of bacteria. In Actinobacteria and Deinococcus-Thermus in particular, more than twenty percent of genes are leaderless. Our results for closely related bacterial genomes imply that changes in translation initiation mechanism between genes derived from a common ancestor are linearly dependent on the phylogenetic relationship. Analysis of the macroevolution of leaderless genes further shows that the proportion of leaderless genes in bacteria has tended to decrease over evolution.

    Design Parameters to Control Synthetic Gene Expression in Escherichia coli

    BACKGROUND: Production of proteins as therapeutic agents, research reagents and molecular tools frequently depends on expression in heterologous hosts. Synthetic genes are increasingly used for protein production because sequence information is easier to obtain than the corresponding physical DNA. Protein-coding sequences are commonly re-designed to enhance expression, but there are no experimentally supported design principles. PRINCIPAL FINDINGS: To identify sequence features that affect protein expression we synthesized and expressed in E. coli two sets of 40 genes encoding two commercially valuable proteins, a DNA polymerase and a single chain antibody. Genes differing only in synonymous codon usage expressed protein at levels ranging from undetectable to 30% of cellular protein. Using partial least squares regression we tested the correlation of protein production levels with parameters that have been reported to affect expression. We found that the amount of protein produced in E. coli was strongly dependent on the codons used to encode a subset of amino acids. Favorable codons were predominantly those read by tRNAs that are most highly charged during amino acid starvation, not codons that are most abundant in highly expressed E. coli proteins. Finally we confirmed the validity of our models by designing, synthesizing and testing new genes using codon biases predicted to perform well. CONCLUSION: The systematic analysis of gene design parameters shown in this study has allowed us to identify codon usage within a gene as a critical determinant of achievable protein expression levels in E. coli. We propose a biochemical basis for this, as well as design algorithms to ensure high protein production from synthetic genes. Replication of this methodology should allow similar design algorithms to be empirically derived for any expression system.

    Genome-Wide Identification of Transcription Start Sites, Promoters and Transcription Factor Binding Sites in E. coli

    Despite almost 40 years of molecular genetics research in Escherichia coli, a major fraction of its transcription start sites (TSSs) is still unknown, which limits our understanding of the regulatory circuits that control gene expression in this model organism. RegulonDB (http://regulondb.ccg.unam.mx/) aims to integrate the genetic regulatory network of E. coli K12 and has so far been an entirely bioinformatic project. In this work, we extended its aims by generating experimental data at a genome scale on TSSs, promoters and regulatory regions. We implemented a modified 5′ RACE protocol and an unbiased High Throughput Pyrosequencing Strategy (HTPS) that allowed us to map more than 1700 TSSs with high precision. From this collection, about 230 corresponded to previously reported TSSs, which helped us to benchmark both our methodologies and the accuracy of the previous mapping experiments. The other ca 1500 TSSs mapped belong to about 1000 different genes, many of them with no assigned function. We identified promoter sequences and the type of σ factors that control the expression of about 80% of these genes. As expected, the housekeeping σ70 was the most common type of promoter, followed by σ38. The majority of the putative TSSs were located between 20 and 40 nucleotides from the translational start site. Putative regulatory binding sites for transcription factors were detected upstream of many TSSs. For a few transcripts, riboswitches and small RNAs were found. Several genes also had additional TSSs within the coding region. Unexpectedly, the HTPS experiments revealed extensive antisense transcription, probably for regulatory functions. The new information in RegulonDB, now with more than 2400 experimentally determined TSSs, strengthens the accuracy of promoter prediction, operon structure, and regulatory networks, and provides valuable new information that will facilitate a global understanding of the complex and intricate regulatory network that operates in E. coli.

    Identification and functional characterization of small non-coding RNAs in Xanthomonas oryzae pathovar oryzae

    Background: Small non-coding RNAs (sRNAs) are regarded as important regulators in prokaryotes and play essential roles in diverse cellular processes. Xanthomonas oryzae pathovar oryzae (Xoo) is an important plant pathogenic bacterium which causes serious bacterial blight of rice. However, little is known about the number, genomic distribution and biological functions of sRNAs in Xoo. Results: Here, we performed a systematic screen to identify sRNAs in the Xoo strain PXO99. A total of 850 putative non-coding RNA sequences originating from intergenic and gene antisense regions were identified by cloning, of which 63 were also identified as sRNA candidates by computational prediction and were thus considered Xoo sRNA candidates. Northern blot hybridization confirmed the size and expression of 6 sRNA candidates and 2 other cloned small RNA sequences, which were then added to the sRNA candidate list. We further examined the expression profiles of the eight sRNAs in an hfq deletion mutant and found that two of them showed drastically decreased expression levels, and another exhibited an Hfq-dependent transcript processing pattern. Deletion mutants were obtained for seven of the Northern-confirmed sRNAs, but none of them exhibited obvious phenotypes. Comparison of the proteomic differences between three of the ΔsRNA mutants and the wild-type strain by two-dimensional gel electrophoresis (2-DE) analysis showed that these sRNAs are involved in multiple physiological and biochemical processes. Conclusions: We experimentally verified eight sRNAs in a genome-wide screen and uncovered three Hfq-dependent sRNAs in Xoo. Proteomics analysis revealed that Xoo sRNAs may take part in various metabolic processes. Taken together, this work represents the first comprehensive screen and functional analysis of sRNAs in rice pathogenic bacteria and facilitates future studies on sRNA-mediated regulatory networks in this important phytopathogen.

    Modeling Structure-Function Relationships in Synthetic DNA Sequences using Attribute Grammars

    Recognizing that certain biological functions can be associated with specific DNA sequences has led various fields of biology to adopt the notion of the genetic part. This concept provides a finer level of granularity than the traditional notion of the gene. However, a method of formally relating a set of parts to a function has not yet emerged. Synthetic biology both demands such a formalism and provides an ideal setting for testing hypotheses about relationships between DNA sequences and phenotypes beyond the gene-centric methods used in genetics. Attribute grammars are used in computer science to translate the text of a program source code into the computational operations it represents. By associating attributes with parts, modifying the value of these attributes using rules that describe the structure of DNA sequences, and using a multi-pass compilation process, it is possible to translate DNA sequences into molecular interaction network models. These capabilities are illustrated by simple example grammars expressing how gene expression rates are dependent upon single or multiple parts. The translation process is validated by systematically generating, translating, and simulating the phenotype of all the sequences in the design space generated by a small library of genetic parts. Attribute grammars represent a flexible framework connecting parts with models of biological function. They will be instrumental for building mathematical models of libraries of genetic constructs synthesized to characterize the function of genetic parts. This formalism is also expected to provide a solid foundation for the development of computer assisted design applications for synthetic biology.